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Abstract 

As neuroimaging databases grow in size and com- 
plexity, the time researchers spend investigating 
and managing the data increases to the expense of 
data analysis. As a result, investigators rely more 
and more heavily on scripting using high-level lan- 
guages to automate data management and process- 
ing tasks. For this, a structured and programmatic 
access to the data store is necessary. Web services 
are a first step toward this goal. They however 
lack in functionality and ease of use because they 
provide only low level interfaces to databases. We 
introduce here PyXNAT, a Python module that in- 
teracts with The Extensible Neuroimaging Archive 
Toolkit (XNAT) through native Python calls across 
multiple operating systems. The choice of Python 
enables PyXNAT to expose the XNAT Web Ser- 
vices and unify their features with a higher level 
and more expressive language. PyXNAT provides 
XNAT users direct access to all the scientific pack- 
ages in Python. Finally PyXNAT aims to be effi- 
cient and easy to use, both as a backend library to 
build XNAT clients and as an alternative frontend 
from the command line. 



Introduction 

The neuroimaging community is producing imag- 
ing and related data at an increasing rate. Pub- 
licly available data and consortia shared data fol- 
low the same trend as funding agencies more rou- 
tinely require some sharing from the grantees. The 
need to share and to maintain data resources at 
different scales, from large, multi-site studies to in- 



dividual laboratories or researchers, has also led 
to the development of neuroimaging data manage- 
ment systems (Van Horn and Toga, 2009). For in- 
stance, the USA-based Biomedical Informatics Re- 
search Network (BIRN) has during the past ten 
years developed a number of tools to facilitate col- 
laborative research and data sharing in neuroimag- 
ing. These efforts included the development or 
use of ontologies (Larson et al., 2009), data for- 
mat exchange (Gadde et al., 2012), as well as 
databases and data management systems includ- 
ing the Human Imaging Database (HID) (Kcator 
et al., 2008), the LONI IDA (Van Horn and Toga, 
2009), and The Extensible Neuroimaging Archive 
Toolkit (XNAT) (Marcus et al., 2007). Other con- 
sortia and initiatives have also emerged to facili- 
tate the handling and sharing of neuroimaging data 
such as Neurolog (Montagnat and Gaignard, 2008) 
and CABIG (Kakazu et al., 2004). Projects such 
as the ADNI (Petersen et al., 2010) make avail- 
able high quality large datasets to the community, 
and the number of large multi-modal databases 
is growing very fast (Van Essen, 2002; Van Horn 
and Toga, 2009). Numerous tools for managing 
the neuroimaging and associated data have been 
developed as a consequence of all these projects. 
Most of them are built for a specific consortium 
or laboratory, (Ozyurt et al., 2010; Montagnat and 
Gaignard, 2008) . Fewer are made to be distributed 
as a re-usable stand-alone system. Amongst those, 
XNAT is one of the most commonly used. It is in- 
stalled in many major institutions 1 and enjoys an 
increasing adoption in the community. 

Databases aim to organize data in a way that it 
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can be efficiently queried and stored. There are var- 
ious database models 2 (Maier, 1983; Cattell et al., 
i997; Angles and Gutierrez, 2008) that can be cho- 
sen to structure the data depending on the problem 
to solve. As the complexity and size of the data 
increases, neuroimaging databases become harder 
to use because they combine metadata, stored in 
the database itself, with images, stored in an un- 
derlying file system. Currently most users man- 
ually select and download data through graphical 
user interfaces (GUI), which is intuitive on a small 
scale, but becomes impractical and error prone on 
a large one. That is because downloading the data 
on a local disk before processing it breaks the way 
the data is structured in the database. The data 
has to be organized again, but locally, in a consis- 
tent layout of files and directories, which effectively 
duplicates the work done setting up the database. 
Futhermore metadata used to select the data of in- 
terest such as quality check variables are likely to 
change during the life span of the database, which 
makes it even harder to keep a local dataset syn- 
chronized and organized. For example, any man- 
ual operations to download the data would have 
to be entirely repeated to reprocess an up-to-date 
dataset. All these reasons explain why it is more 
and more necessary to directly interact program- 
matically with the database. Indeed, relying on a 
scripted interaction helps maintain consistency of 
accessed data and appropriate versions of the data. 
This new data flow is represented in Figure i. In 
short, large databases call for new ways of interact- 
ing with and analyzing the data. 

XNAT Neuroimaging data management systems 
in general, and XNAT in particular, must concur- 
rently solve several issues, depending on specifica- 
tions of the system. The most common challenges 
are: to make the data available in a sustained and 
secure manner, offer ways of searching and query- 
ing specific data, and finally to enable updating 
the data repository, either for curation or storing 
results obtained from processing tools. 

The XNAT system solved many of these prob- 
lems. Its core design feature is that it models 
the data through XML schemas, and automatically 
builds a relational database and a web interface for 
accessing the data using that formal description. 

2 http : //en. wikipedia. org/wiki/Database_model 



Many XML schemas are already available to de- 
scribe common neuroimaging or neuropsychological 
data. These schemas greatly ease the work of the 
data manager constructing the database. XNAT 
includes many useful features such as a permission 
and access rights system, tabular views, and search 
capacities. Based on this XML description, XNAT 
users can send queries and receive appropriate re- 
sults. XNAT has moreover developed a representa- 
tional state transfer Application Program Interface 
(REST API) to push and pull data from an XNAT 
database. This REST API enables software devel- 
opers and power users to programatically query the 
server in order to access the data and its associated 
meta. However, REST APIs are low-level inter- 
faces that require a significant amount of technical 
knowledge to perform basic operations. 

PyXNAT We developed a Python library (PyX- 
NAT) to unify the different REST resources for ac- 
cessing and providing the data to the database as 
well as ease scripting interactions with an XNAT 
database. Python has recently gained a strong mo- 
mentum in the neuroimaging data analysis com- 
munity, and more generally in ncuroscicncc. Com- 
bined with powerful scientific librairies, it is now 
close to providing a credible alternative to other 
high level platform independent interpreted lan- 
guage such as MATLAB™ , but without additional 
licence costs. It offers in addition a very large set of 
software engineering utilities such as XML parsing, 
database, and web interface modules. We discuss 
more in depth to the choice of Python in the next 
section. 

The code of PyXNAT was originally developed at 
Neurospin, (I2BM, CEA, France) in the context of 
the IMAGEN European project 3 (Schumann et al., 
2010) to help this consortium interact with the IM- 
AGEN database, we designed PyXNAT to be of 
general use for the neuroimaging community and 
licenced it under BSD-3 4 . The code is available 5 
online and its documentation and unit tests cover- 
age quality is kept high so that the programmers ex- 
ternal to the project can easily contribute. Indeed, 
PyXNAT has started to shift toward a community- 
based development. 

3 http:/ /www.imagen.eu 

4 http: / /www. opensource.org/licenses/BSD-3-Clausc 
5 http: / /packages. python. org/ pyxnat 
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The rest of this article is organized as follows. 
In the first section, we give some background infor- 
mation on the software components on which PyX- 
NAT is based. The second section describes the 
construction of the library and gives some use case 
examples. Last, we discuss possible limitations and 
conclude with future improvements. 

1 Material and methods 

We first review the different technologies and com- 
ponents leveraged by PyXNAT as a preliminary to 
discussing the implementation design. 

Python 

We chose to use the Python language since it enjoys 
a growing success in the neuroimaging and neuro- 
sciencc communities. Indeed, it has recently been 
subject of a special-topic issue in Frontiers in Neu- 
roinformatics entitled "Python in neuroscience". It 
is a multi-paradigm programming language (for ex- 
ample, it supports object-oriented, functional, and 
procedural programming) with a simple and con- 
sistent syntax. It benefits from very efficient open- 
source scientific packages for numerical computa- 
tion such as NumPy (Oliphant, 2006) and SciPy 
(Jones ct al., 2001), making it a viable alterna- 
tive or useful complement to other analysis tools 
such as MATLAB™. Its flexibility and concise 
syntax speeds the process of prototyping new algo- 
rithms and trying out existing softwares. Another 
strength of the language lies in the variety of its ap- 



plication fields, which cover both scientific (Lang- 
tangcn, 2011) and non-scientific — but relevant — 
domains such as database management and web 
development . 

Python defines a standard for database inter- 
faces, which is the Python DB-API (PEP 249 
6 ). PyXNAT acts as an interface to an XNAT 
database, but it is a Pythonic wrapping on a REST 
API rather than a database driver. As such, it does 
not really follow a specification based on standard 
database mechanisms and does not replicate opera- 
tions such as transactions, which are transparently 
handled by the XNAT underlying database. How- 
ever it follows some principles from the specifica- 
tion, if not always with the same semantics. As 
an example, the PEP 249 defines cursor objects 
as "[objects] used to manage the context of a fetch 
operation". In other words, these objects are re- 
sponsible for controlling the data fetching but do 
not do anything when instantiated. They instead 
rely on lazy loading 7 mechanisms that access the 
data only when it is needed. PyXNAT design re- 
uses this principle. 

XNAT 

Overview and key features 

XNAT (Marcus et al., 2007) is an open source 
software platform designed to manage neuroimag- 
ing and related data. An XML Schema 8 defines 

6 http : //www . python . org/dev/peps/pep-0249/ 
7 http: / /en. wikipedia.org/wiki/Lazy_loading 
8 http:/ /www. w3.org/XML/Schema 
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Figure 2: XML Schema for the result table 



the XNAT data model and is used to generate a 
database back-end and a web interface front-end. 
Using the XML Schema as an abstraction layer has 
several advantages. First, the schema provides a 
formal representation of the data in a standard for- 
mat and enables the definition of high-level rela- 
tions between concepts such as inheritance. Sec- 
ond, it makes XNAT extensible since the base 
schema can accommodate extra types for specific 
studies using custom external schemas. 

XNAT search engine 

XNAT features a powerful search engine with its 
own query language that enable users to search 
data across all the data types defined in the data 
model in a transparent manner. The query lan- 
guage is specified with an XML Schema docu- 
ment. It enables standard relational database op- 
erations such as projection and selection. The 
data is returned as a CSV or JSON table and 
can be customized by defining elements in the 
XML query document. The XML schema in Fig- 
ure 2 specifies how to format the results as tab- 
ular data. The root_element_name corresponds 
to the type of data rendered for each row of 
the table (e.g., xnat:subjectData), whereas the 
search_f ield elements defines the columns (e.g., 



xnat:subjectData/SUBJECTJD). The results of 
the query are therefore ready to be used in any pro- 
gram or spreadsheet software. The XML Schema 
in Figure 3 defines how to express search predi- 
cates for XNAT. The main element of the query 
is the criteria_set clement, which can nested 
with chilcLset elements in order to perform 
more complex queries (e.g., return subjects that 
are over 20 years old and left-handed, or sub- 
jects that are under 20 and right-handed). The 
criteria_set clement takes a method value which 
indicated which boolean operator to use (AND or 
OR). Each criteria_set is composed of a list 
of constraints, defined by a schema_field (e.g., 
xnat:subjectData/HANDEDNESS), a comparison 
operator, and a value. The query language there- 
fore enables usual operations such as criteria com- 
parison and nesting. 

To promote interaction between different users of 
the same database and help system administrators, 
the XNAT search engine provides a way to share 
queries for all or a subset of users. 

The REST model 

REST (Representational State Transfer) (Fielding, 
2000) is a generalization of the architectural prin- 
ciples of the World Wide Web and is used to de- 
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Figure 3: XML Schema for the search criteria 



velop web services alongside or as an alternative 
to other specifications such as the Simple Object 
Access Protocol or the Common Object Request 
Broker Architecture. A RESTful architecture iden- 
tifies a set of resources, which can be entities or 
collections, with standardized Uniform Resource 
Identifiers (URIs). The methods to interact with 
the resources rely on HTTP verbs — such as GET, 
PUT, POST and DELETE— that are mapped to 
resource-specific semantics. This means that re- 
sources map to a set of views to represent the data 
state on the server independently from the way it 
is stored. REST also allows representing the re- 
sources content in different formats (e.g., XML, 
HTML, and plain text). 

XNAT uses URIs' generic syntax, which consists 
in a sequence of component parts describing the 
communication protocol, the resource location, and 
additional information. An example inspired from 
the RFC 3986 9 summarizes the syntax: 

http : //central . xnat . org/REST/projects?f ormat=csv 

\__/ \ / \ / \ / 

II II 
scheme authority path query 

RESTful architectures organize resources in a hi- 
erarchy. Basically, URIs' paths are constructed us- 

9 http:/ /www. ietf.org/rfc/rfc3986.txt 



ing a fixed set of keywords that have parent-child 
relations. In XNAT, the main concepts follow the 
tree structure represented in Figure 4. Keywords 
are paired with an ID to point to a specific resource. 
Collection resources return a list of identifiers and 
do not end with an ID. Table 1 illustrates how 
XNAT uses the REST resources to list the project 
names on a server and access a specific one. 



Resource type Path 

element resource /REST/projects/PROJID 
collection resource /REST/projects 



Table 1: XNAT URI design 



URIs support a range of operations through 
the HTTP verbs. Collection resources typically 
only support the GET method whereas element re- 
sources use GET, PUT and DELETE to support 
access, creation, and deletion operations. To per- 
form additional operations, XNAT leverages the 
query component of URIs. As shown in Table 2, it 
enables selecting and filtering the outputs as well 
as choosing the output format. 
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Figure 4: XNAT REST model 



Option URI query string 

select output ?columns=ID, project 

filter output ?xsiType=xnat:mrSessionData 

output format ?format=csv 



Table 2: URI query strings usage in XNAT 

The XNAT REST API is separated in two parts: 
the hierarchical structure described on Figure 4 and 
the search engine. Navigation through the database 
and downloading files attached to subjects or exper- 
iments is accessible from the URIs whereas getting 
tables containing metadata is enabled by the search 
engine. 

2 Results 

We implemented the PyXNAT package on top of 
the XNAT REST API to enable easy communica- 
tions with XNAT through the Python language. In 
this section, we describe the general design of the 
library as well as specific mechanisms that are origi- 
nal or of particular importance. We finish by giving 
some examples of real life uses cases for PyXNAT. 

Architecture and design 

PyXNAT combines several components to interact 
with an XNAT server, which are described in Fig- 
ure 5. Its core relies on the httplib2 Python mod- 
ule, which is in charge of issuing calls to XNAT. 
The REST structure itself, which is described Fig- 
ure 4 is static and cannot be discovered with the 
XNAT REST API. This is why PyXNAT uses a 



configuration file to model the REST structure and 
to generate a programming interface that maps the 
REST API to Python objects and methods. The 
modelling of the REST API is used to generate 
HTTP calls against XNAT as well as parsing the 
responses to generate Python objects. 

The XNAT REST API is composed of a set of 
hierarchical resources, which is the REST structure 
itself, and an endpoint for its search engine. The 
REST structure gives access to files (e.g., images), 
as well as metadata. The URIs look a lot like files 
and directories paths on a file system and can effec- 
tively be viewed as such. The search engine gives 
access to the metadata and enables searching them, 
but does not provide any mechanism to point to 
files. So with the REST API, it is possible to look 
for all the subjects that are 14 years of age or have 
a specific answer to an assessment. But the search 
will not be able to yield URIs to files. PyXNAT 
builds on XNAT by using the results of the search 
engine to subsequently generate URIs and retrieve 
files. By tightly integrating those two mechanisms, 
PyXNAT delivers a more powerful and succint way 
to interact with XNAT. For example, PyXNAT is 
able to retrieve all the assessments from a subject 
in a single statement whereas the REST API from 
XNAT would require several calls. 
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Figure 5: PyXNAT architecture 
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Object Mapper 

PyXNAT borrows language elements from SQL 
(Structured Query Language) and the Python DB- 
API to define a familiar and easy to use query lan- 
guage. The select statement defines the data to 
return from a query, either a list of identifiers if it is 
a collection object, or a Python object point- 
ing to a single URI if it is an element object. 
An object mapper returns Python objects reflect- 
ing URIs, and offering actions through their meth- 
ods. These actions include, resource operations, 
database browsing, and files downloading and up- 
loading. 

interface . select ( '/ projects ' ) 
interface . select ( '/ project /PROJID ' ) 

The element objects share common operations 
for insertion and deletion for example, but also fea- 
ture specific methods. For example all types of 
entities can be created and deleted, but only the 
project objects may handle the access permissions: 

p = i nt er face . s e 1 e c t ('/ proj ect /PROJID ' ) 

p. insert () 
p. delete () 
p. exists () 

p. set_accessibility( 'public ') 

Collection objects use a lazy loading mecha- 
nism and work essentially the same way as cursor 
objects as defined in Python DB-API. The object 
itself doesn't issue any request on the database and 
delegates the actual query to dedicated methods. 
The table 3 summarizes and compares the collec- 
tion or cursor objects methods. 



Python DB-API pyxnat-API 

fetchone() first() or fetchone() 

fctchmanyO not-supported 

fetchallQ get() or fetchallQ 



Table 3: Cursor objects methods comparison. 

PyXNAT container objects are implemented as 
Python generators. Generators provide a conve- 
nient mechanism for lazily looping over items (e.g., 
using a for loop) without accessing or loading 
those items until they are needed. For example, 



to perform an operation on every subject from all 
projects, PyXNAT operates on the subjects from 
each project one at a time. Without the lazy access 
mechanism, the library would have to retrieve all 
the subjects from all the projects before operating 
on them. This might take a considerable amount 
of time that can be used instead to start operations 
that needed the subjects in the first place. While 
there are other ways to iterate over subjects, this 
example demonstrates the flexibility introduced by 
the PyXNAT library: 

s = interface. select ('/projects/*/ subjects' 
) 

for subject in s: 

<perform operation> 

The keywords used in the select statement are 
the same as the ones defining the REST structure 
represented in Figure 4. The ability to chain those 
keywords through the Python objects enable users 
to express more complex queries very easily. For 
example, the following calls, which are equivalent, 
return all the files for all the experiments related 
to a subject in any project in the database. Of 
course, it would be as easy to use identifiers or more 
constrained filters instead of the wildcard "*" in 
order to return a specific set of files. The different 
syntaxes all rely on the same underlying Python 
objects. They exist because there are two different 
ways to use PyXNAT. First, as a library, which 
calls for efficiency and enables to reference directly 
the data. Second, as an interactive command line 
frontend for XNAT, in which case performance is 
not the main concern. The need to quickly explore 
the database is, however, far more important and 
explains the introduction of shortcuts in the syntax. 

interface . select . projects () . subjects () . 
experiments () . resources () . files () 

interface . select ( '/ projects /*/ subjects /*/ 
experiments/*/ resources /*/ files ') 

interface . select ( '//experiments// files ') 



Search integration 

XNAT's search engine sets up queries using the 
dataypes defined in the XML schemas. It is ac- 



7 



pyXNAT 



XNAT 



search files 



search generator 



request generator 




return files 

Figure 6: PyXNAT integration of the search engine and the files access 



cessible from a single UPJ, on which it is possible 
to POST — basically send — an XML file describing 
the query. The request gets results in form of a 
CSV (comma-separated values) table, which con- 
tains the requested data and identifiers. However, 
the URIs referencing files cannot be returned be- 
cause they are not stored in the database. More- 
over, the REST API does not provide any mecha- 
nism to use the identifiers to build URIs referencing 
files. PyXNAT deals with the complexity of writing 
the XML documents and offers a simple language 
to use the search engine. It also parses the out- 
puts from the search engine to generate valid URIs 
that get resources on XNAT. Thoses mechanisms 
are illustrated in Figure 6. 

To keep the semantics consistent throughout the 
API, PyXNAT uses again its select statement to 
define the data to capture, but with different pa- 
rameters. The first argument is specifies the type 
of an entry, and the second argument is a list of 
fields and defines the columns of the table to return. 
The SQL where clause is replicated in PyXNAT to 
formulate the search criteria. The criteria set is 
expressed as a list of tuples, where each tuple cor- 
responds to a single search constraint. A constraint 
tuple is a 3- value entity composed of a search field, 
a comparison operator and a value. Every query 
may include sub-queries, expressed by lists of tu- 
ples. The whole request with PyXNAT is therefore 
expressed with an SQL-like syntax which is close to 
other query languages and enables to fully leverage 
the XNAT search engine. This syntax is close to 
SQLAlchemy's 10 , a popular Python ORM library 

10 http:/ /www. sqlalchemy.org 



(Object-relational Mapping) 11 . 

# PyXNAT search example 
row = ' xnat : mrSessionData ' 
columns = [ 

'xnat : subj ect Data /LABEL ' , 
'xnat : mrSessionData/AGE' , 
'xnat : subject Data /GENDER' 

] 

criteria = [ 

( 'xnat : mrSessionData/PROJECT ' , 

'=' , 'MYJPROJECT ' ) , 
( 'xnat : mrSessionData/PROJECT" , 

'=', 'CENTRAL_OASIS_CS') , 
'OR' 

] 

interface . select (row , columns ) . where ( 
criteria) 

# SQLAlchemy example from the online 

documentation 
session . query(User) . filter (User . name . 1 i k e ( ' 

%ed ' ) ) 

The same syntax can be used to combine the 
search engine with the hierarchical REST re- 
sources. The select statement from the object 
mapper can be chained with the where clause, 
which uses the search engine. PyXNAT returns ob- 
jects, for which subjects match the criteria defined 
in the where clause. 

interface . select ( '//experiments ') . where ( 
criteria) 



1 1 http : //en . wikipedia . org/wiki/Ob j ect- relat ional 
mapping 
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Database introspection 

XNAT databases contain several kinds of data such 
as imaging data, demographic information, behav- 
ioral data, and experimental design. These cate- 
gories are further differentiated; for instance, imag- 
ing data may be acquired from several different 
modalities including structural and functional MM 
as well as PET. Further complicating the situa- 
tion, these image modalities are often referred to by 
several different names. For example, a database 
may reference a Tl-weighted image as just "Tl", 
whereas another one reference it as "MPRAGE". 
This terminology problem can be adressed with on- 
tologies and data integration technologies which are 
currently being developed by the community (Bug 
ct al., 2008; Larson et al., 2009). However as a 
first step adressing this problem, PyXNAT provides 
functions to retrieve list of all values used in a given 
database. This functionality gives users the ability 
to interact with the data and the data model, in or- 
der to quickly provide a summary of a large number 
of data types and entries. 

XNAT provides basic introspection methods that 
are replicated and augmented in PyXNAT. In par- 
ticular, XNAT provides REST functions to query 
the data types that are defined by the XML 
Schema. These functions enable users to learn from 
a database, for example, that it defines the concept 
of subject and that a subject has a gender or tha the 
age of the subject is actually defined for an exper- 
iment performed on this subject. However, XNAT 
lacks a helper function to extract all the values that 
a data field takes in a specific database. To provide 
this functionality with a consistent API, PyXNAT 
uses the search engine from XNAT. This is very 
useful when building queries, since it provides a 
list of all values that can be used. In the interac- 
tive session below, we show the different methods 
that enables users to explore the data model and 
find data: 

# retrieve list of datatypes 

»> interface . inspect . datatypes () 
[ . . . , 'xnat : mrSessionData ' , . . . ] 

# retrieve list of datatypes fields 
»> interface . inspect . datatypes( 'xnat : 

mrSessionData ' ) 
[ . . . , ' xnat : mrSessionData /AGE ' , . . . ] 

# retrieve list of field values 

»> interface . inspect . field_values ( 'xnat : 
mrSessionData /AGE ' ) 



'14 ' , '25 ' , '42 



Cache 

PyXNAT maintains a local copy of all the server re- 
sponses into a cache (i.e., the data files that may be 
images as well as the metadata arrays or resources 
listing) . The main goal of the cache is to improve 
performance; but, it can also be rendered persistent 
and provide a full "offline" mode to PyXNAT. 

The PyXNAT cache is primarily an implementa- 
tion of the HTTP caching mechanism that stores 
the data on a filesystem. HTTP is known as 
a request-response protocol, which means that a 
client sends a request to a server that is responsible 
for processing and returning a message in response. 
The message is composed of two main parts: the 
header and the body. The header contains informa- 
tion on the message and on the server. The body 
contains the actual data that was requested (e.g., 
an image). HTTP provides cache validators in the 
header of the messages to transmit the status of the 
resource to the client, which can take a decision on 
the validity of the cached version of the resource. 
This strategy prevents the client from downloading 
unnecessary data and improves performance by re- 
ducing the network traffic. XNAT only provides the 
"Last-Modified" field in the header, which can be 
checked against the date of the local version of the 
data. Only the resources that link to a file support 
the cache validation in XNAT. The other resources 
— elements listing, metadata values — need to be 
downloaded again to make sure the local data is 
up to date. This is why PyXNAT introduces an 
additional expiration mechanism to avoid repeat- 
edly requesting resources to the server for certain 
operations. In other words, if a cached resource is 
accessed within a specified amount of time (default 
to one second), the data will not be downloaded 
again. 

Database management 

PyXNAT supports additional XNAT functionali- 
ties including user, project, and pipeline manage- 
ment as well as search utilities. Those features re- 
flect exactly what is provided by the XNAT REST 
API. We now present two critical interfaces. 
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The first interface provides project management 
functionality. It enables project owners to configure 
their project, add users, and set up access permis- 
sions. This is achieved mainly by configuring two 
attributes: the user role and the project accessibil- 
ity. 

interface . manage . project ( ' ID ' ) . 

set_accessibility ( level ) 
interface . manage . project ( ' ID ').add_user(' 

user ' , role ) 



The second interface is the search utility. It en- 
ables users to create and share searches with other 
users. It uses the same syntax as the one described 
for the where clause. An additional PyXNAT- 
specific feature is the ability to create search tem- 
plates. These templates maintain the ability to be 
shared between users, but instead of carrying val- 
ues, they define keys to be replaced by the actual 
value when used. This makes it possible to easily 
re-use any kind of search. 

criteria = [ ( ' xnat : subjectData/GENDER' , '=' 

, 'male ' ) , 'AND' ] 
interface . manage . search . save ( 'search_name ' , 

row, columns, criteria , users) 
interface . manage . search . get ( 'search_name ' ) 

criteria = [(' xnat : subjectData/GENDER' , '=' 

, ' gender ' ) , 'AND' ] 
interface . manage .search, save .template ( ' 

template_name ' , row, columns, criteria , 
users ) 

interface . manage . search . use_template( ' 
template_name ' , { 'gender ' : 'male ' }) 



Usage examples 

PyXNAT is a powerful and easy to use library to 
build client applications for XNAT. As an example, 
NiPyPE (Ghosh ct al., 2010) is a Python module 
that interfaces to existing neuroimaging software 
such as SPM, FSL or FreeSurfer. It is also able to 
distribute jobs over clusters, which makes it very 
efficient to process large amounts of data. Its data 
connection method was originally file system based 
but it can now in addition access an XNAT server 
through PyXNAT. PyXNAT and NiPype are being 
used jointly to run analysis on IMAGEN, which is 
a European project that aims to study addiction 



risk factor in a database containing over 2000 ado- 
lescents. Figure 8 depicts how the two packages 
interact. 

Other projects have started or already support 
XNAT through PyXNAT. Among them arc the 
XNAT tools from the XNAT group, which were 
originally written in Java and are currently being 
re- written in Python using PyXNAT. Another ex- 
ample is the Connectome Viewer (Gerhard et al., 
2011), which can now read and write data on 
XNAT. 



3 Discussion and Conclusion 

Historically, neuroimaging researchers have used 
ad-hoc procedures for maintaining their analysis 
and data. Over the last few years, there have been 
several attempts to build databases to manage neu- 
roimaging data. The ability to programmatically 
access neuroimaging databases is becoming increas- 
ingly important to perform batch analysis and ad- 
ministration tasks, as their growth makes them all 
but impossible to operate manually. However it 
has been difficult to use standard analysis tools and 
these database systems together. 

One of the most widely used neuroimaging 
database systems is XNAT. XNAT is an open- 
source database that incorporates many useful and 
powerful features including an efficient search en- 
gine and a REST API. However, being written in 
Java and having a REST API, it offers no natural 
bridge to the most common analysis tools, that are 
accessible from a scripting language (like Matlab or 
Python) or from the command line. 

PyXNAT provides a bridge between XNAT 
and analysis tools. Combined with an interac- 
tive Python terminal such as IPython (Perez and 
Granger, 2007), it can also be used as an alter- 
native front-end for XNAT. Since it is written in 
Python, it becomes readily accessible to the vast 
and relevant set of Python tools in the neuroscience 
domain. Moreover, command line tools can easily 
be developed using PyXNAT, as such, popular pro- 
gramming languages can easily benefit from PyX- 
NAT merely by issuing system calls. The XNAT 
team is currently rewriting its command line tools 
using PyXNAT. Most programming languages pro- 
vide bridges toward other languages. As an ex- 
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HTTP request without cache 



GET /imagesfTl.nii.gz 




server 



HTTP request with cache 



GET /images/Tl.nii. gz 




Figure 7: HTTP cache mechanism 




Figure 8: PyXNAT and NiPyPE interactions 



ample, PyMat 12 enables Python scripts to exe- 
cute Matlab commands, and pass back and forth 
Numpy arrays. While these solutions are not as ro- 
bust as keeping a single programming environment, 
they provide a viable option to use existing code. 
Another solution to use PyXNAT from other lan- 
guages is to use the softwares wrapped by NiPyPE. 

PyXNAT focuses on ease of use, combining 
RESTful services with clear semantics and adding 
helper features. It also makes PyXNAT highly ef- 
ficient: being a thin layer over HTTP with a cache 
mechanism, it is at least as efficient as native REST 
calls. The package is open-source under the BSD 
license. It is available for download 13 and has an 
online documentation 14 , which covers installation 
and usage. 

There are several areas where PyXNAT, in its 
current version (0.9.3), needs to be improved. One 
of the most important areas that could be im- 
proved is the PyXNAT cache, which is currently 
only disk-based. If several processes share the same 
cache folder, one has to be careful to avoid concur- 
rent read and write operations on the same files. 
The cache could be replaced by a full featured lo- 
cal database. It would support concurrent access 

12 http:/ /claymore. engineer. gvsu.edu/ steri- 
ana/Python/pymat.html 

13 http: / /pypi. python. org/ pypi/pyxnat / 
14 http: / /packages, python, org/ pyxnat / 
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and also enable an offline mode for PyXNAT with 
search capabilities. One could also add synchro- 
nization features to update the local database and 
push back generated results to the remote server. 
Users would then be able to work seemlessly on 
and offline. Other possible improvements include 
a logging framework to trace all the REST calls, 
advanced data filtering capabilities from the REST 
API, or the prearchive mechanism from XNAT. 

PyXNAT could be used to develop a federa- 
tion layer between XNAT servers. It would mostly 
help to access the data, but using its introspection 
functions, it could issue simple queries on multi- 
ple XNAT instances. PyXNAT could also help to 
federate heterogeneous databases systems, but as 
a component along similar librairies. The com- 
plex challenges of data integration, such as data 
alignment would however have to be addressed 
separately. The INCF (International Neuroinfor- 
matics Coordinating Facility), and in particular 
its datasharing task force, is currently working on 
these issues. For example, the datasharing task 
force is working on an API for accessing differ- 
ent ncuroimaging databases (XNAT, HID, IDA, 
...), that could eventually be re-used in PyXNAT. 
Its goal is not to promote a specific database, but 
rather standards and methods to share and re-use 
neuroimaging data. However due to its popularity 
and relevance, XNAT and therefore PyXNAT are 
part of the components being used to build proto- 
types for the initiative. 

In conlusion, PyXNAT enables XNAT access in 
the Python environment. It can be used both as an 
interactive command line interface and as a back- 
end communication library. We see PyXNAT as 
an major step to help process and administrate 
datasets in XNAT servers. 
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5 Appendix 



This small example illustrates how to download Tl images from subjects over 80 years old on XNAT 
Central 15 with PyXNAT, and process them in parallel on a computer. For the sake of simplicity, we 
chose the BET command line tool, which extracts the brain from the image of the whole head, as an 
analysis example. The functions, that distribute the processing on several processors, are part of the 
standard library of Python. The example is also available on github 16 , and requires FSL 1 ' and pyxnat 
version 0.9.3 or above to run. The script will prompt the user for a login and password, so one may 
need to first register on XNAT CENTRAL. 



import os 

from subprocess import Popen 
import multiprocessing as mp 

import pyxnat 

URL = ' https : // central. xnat. org' # central URL 

BET = ' fsl4 .l-bet2 ' # BET executable path 

central = pyxnat . Interface (URL) # connection object 

def bet(in_img, in_hdr): # Python wrapper on FSL BET, essentially a system call 
in_image = in_img.get() # download . img 

in _hdr . get ( ) # download . hdr 

path, name = os . path . s p 1 i t ( in_image ) 
in.image = os.path.join( path , name .rsplit('.')[0]) 
out -image = os. path . j o i n ( path , name .rsplit('.')[0] + '-brain') 
print '=> %s ' % in_image [ - 120:] 
Popen ( '%s %s %s ' % (BET, in.image , out_image) , 

s h e 1 l=True ) . communicate ( ) 
return out_image 

notify = lambda m: sys.stdout.write( '<= %s \n ' % m[ — 12 0:]) # print finish message 
pool = mp. Pool ( processes=mp. cpu-COunt ( ) * 2) # pool of concurrent workers 
images = {} 

query = ( '/ pr o j e c t s /CENTRAL_OASIS_CS/ s u b j ec t s /* ' 

'/experiments/ * JMR1 /scans / mpr— l*/resources/*/files /* ') 
filter. = [( 'xnat : mrSessionData/AGE ' , '> ' , '80'), 'AND'] 

for f in c e nt r al . s e le c t ( query ). where ( f i 1 1 e r _) : 
label = f . label () 

# images are stored in pairs of files (.img, .hdr) in this project 
if label . endswith ('. img ') : 

images. setdefault( label, split ('. ')[0] , []) . append ( f ) 
if f . label (). endswith ('. hdr ') : 

images. setdefault( label, split ('. ')[0] , []) . append ( f ) 

# download and process both occur in parallel within the workers 
for name in images . keys () : 

if len ( images [name] ) = 2: # if .img and .hdr XNAT references are ready 
img, hdr = images . pop (name) # get references 

pool . apply _async ( bet , (img, hdr), callback=notify ) # start worker 

pool . close () 
pool . j o i n ( ) 



15 https:/ /central. xnat. org 
16 https://gist.github.com/1816347 

17 fsl-bet path may have to be changed in the script to match your installation 
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